Similar resources
Focused Web Crawling Algorithms
Nowadays the web is rich in all kinds of information, and this information is freely available thanks to hypermedia information systems and the Internet. This information has greatly influenced our lives, our lifestyles and our ways of thinking. A web search engine is a complex multi-level system that helps us search the information available on the Internet. A web crawler is one of the most i...
Focused Web Corpus Crawling
In web corpus construction, crawling is a necessary step, and it is probably the most costly of all, because it requires expensive bandwidth usage, and excess crawling increases storage requirements. Excess crawling results from the fact that the web contains a lot of redundant content (duplicates and near-duplicates), as well as other material not suitable or desirable for inclusion in web cor...
Language Specific and Topic Focused Web Crawling
We describe an experiment on collecting large language and topic specific corpora automatically by using a focused Web crawler. Our crawler combines efficient crawling techniques with a common text classification tool. Given a sample corpus of medical documents, we automatically extract query phrases and then acquire seed URLs with a standard search engine. Starting from these seed URLs, the cr...
Ontology Driven Focused Crawling of Web Documents
Given the dynamism of the World Wide Web in recent years, the issue of discovering relevant web pages has become an important challenge. A focused crawler aims at selectively seeking out pages that are relevant to a pre-defined set of topics. Most current approaches perform syntactic matching; that is, they retrieve documents that contain particular keywords from the user’s query. This often leads t...
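The abstract above contrasts ontology-driven crawling with syntactic keyword matching. As a rough illustration of the keyword-matching side only (not the paper's method), here is a minimal Python sketch. The function name `keyword_relevance` and its scoring rule (the fraction of topic keywords that appear as exact tokens) are hypothetical choices made for illustration.

```python
import re


def keyword_relevance(text: str, topic_keywords: set[str]) -> float:
    """Hypothetical scorer: fraction of (lowercase) topic keywords found in text.

    This is purely syntactic matching: a keyword counts only if the exact
    token appears in the text, so synonyms are ignored.
    """
    if not topic_keywords:
        return 0.0
    # Tokenize into lowercase alphabetic words.
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    hits = sum(1 for kw in topic_keywords if kw in tokens)
    return hits / len(topic_keywords)


topic = {"crawler", "ontology", "relevance"}
print(keyword_relevance("An ontology guides the crawler.", topic))  # → 0.6666666666666666
print(keyword_relevance("A web spider guide", topic))  # → 0.0 ("spider" is not "crawler")
```

Because the match is purely lexical, a page that says "spider" instead of "crawler" scores zero. Synonymy of this kind is one limitation that ontology-based approaches are generally intended to address.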
Evolving Strategies for Focused Web Crawling
The rapid growth of the World Wide Web has created many challenges for general-purpose crawlers, search engines and web directories alike, making it difficult to find, index, and classify web pages based on a topic. Topic-driven crawlers can complement search engines because they pre-classify the pages retrieved by the crawl. To implement such a focused crawler, a strategy for ordering the crawl...
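The abstract above notes that a focused crawler needs a strategy for ordering what it crawls. One common generic strategy is best-first search over a priority queue, sketched below in Python. This is not the strategy proposed in that paper. The in-memory `links` graph stands in for real HTTP fetching and link extraction, and the `score` function is a caller-supplied relevance estimate. Both, like the function name `best_first_crawl`, are hypothetical.

```python
import heapq
from typing import Callable


def best_first_crawl(
    seeds: list[str],
    links: dict[str, list[str]],
    score: Callable[[str], float],
    max_pages: int,
) -> list[str]:
    """Visit URLs in order of descending estimated relevance.

    `links` is a hypothetical stand-in for fetching a page and extracting its
    out-links; `score` is a hypothetical relevance estimate for a URL.
    """
    frontier: list[tuple[float, str]] = []  # min-heap keyed on negated score
    seen: set[str] = set()

    def enqueue(url: str) -> None:
        if url not in seen:  # never queue the same URL twice
            seen.add(url)
            heapq.heappush(frontier, (-score(url), url))

    for url in seeds:
        enqueue(url)

    visited: list[str] = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)  # most promising URL first
        visited.append(url)
        for nxt in links.get(url, []):
            enqueue(nxt)
    return visited


links = {"seed": ["a", "b"], "a": ["c"], "b": ["d"]}
scores = {"seed": 1.0, "a": 0.2, "b": 0.9, "c": 0.1, "d": 0.8}
print(best_first_crawl(["seed"], links, lambda u: scores.get(u, 0.0), max_pages=4))
# → ['seed', 'b', 'd', 'a']
```

With a four-page budget, the low-scoring page `c` is never fetched. Saving budget on low-relevance pages is the reason for prioritizing the frontier instead of crawling breadth-first.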
Journal
Journal title: ACM Transactions on the Web
Year: 2014
ISSN: 1559-1131,1559-114X
DOI: 10.1145/2644821